201 research outputs found

    Lost in parameter space: A road map for Stacks

    This is the author accepted manuscript. The final version is available from Wiley via the DOI in this record. 1. Restriction site-Associated DNA sequencing (RAD-seq) has become a widely adopted method for genotyping populations of model and non-model organisms. Generating a reliable set of loci for downstream analysis requires appropriate use of bioinformatics software, such as the program stacks. 2. Using three empirical RAD-seq datasets, we demonstrate a method for optimising a de novo assembly of loci using stacks. By iterating values of the program's main parameters and plotting resultant core metrics for visualisation, researchers can gain a much better understanding of their dataset and select an optimal set of parameters; we present the 80% rule as a generally effective method to select the core parameters for stacks. 3. Visualisation of the metrics plotted for the three RAD-seq datasets shows that they differ in the optimal parameters that should be used to maximise the amount of available biological information. We also demonstrate that building loci de novo and then integrating alignment positions is more effective than aligning raw reads directly to a reference genome. 4. Our methods will help the community in honing the analytical skills necessary to accurately assemble a RAD-seq dataset. This work was co-funded by the Environment Agency, Westcountry Rivers Trust and the University of Exeter. Overseas collaboration for the project was made possible by funding from The Genetics Society, Santander and the University of Exeter. Thank you to many RAD-seq workshop participants for invaluable insight and new ideas. We thank Dr Nicolas Rochette for his insights into parameter analysis. Thanks also to Dr Andy King for assistance with the brown trout data molecular work and analysis, and Guy Freeman and Martin Young for the species illustrations. Prof Peter Kille and Dr Luis Cunha, Cardiff School of Biosciences, Cardiff University, kindly provided the reference genome of L. rubellus.

    Screening synteny blocks in pairwise genome comparisons through integer programming

    Background: It is difficult to accurately interpret chromosomal correspondences such as true orthology and paralogy due to significant divergence of genomes from a common ancestor. Analyses are particularly problematic among lineages that have repeatedly experienced whole genome duplication (WGD) events. To compare multiple "subgenomes" derived from genome duplications, we need to relax the traditional requirements of "one-to-one" syntenic matchings of genomic regions in order to reflect "one-to-many" or more generally "many-to-many" matchings. However, this relaxation may result in the identification of synteny blocks that are derived from ancient shared WGDs that are not of interest. For many downstream analyses, we need to eliminate weak, low scoring alignments from pairwise genome comparisons. Our goal is to objectively select a subset of synteny blocks whose total scores are maximized while respecting the duplication history of the genomes in comparison. We call this "quota-based" screening of synteny blocks in order to appropriately fill a quota of syntenic relationships within one genome or between two genomes having WGD events. Results: We have formulated the synteny block screening as an optimization problem known as "Binary Integer Programming" (BIP), which is solved using existing linear programming solvers. The computer program QUOTA-ALIGN performs this task by creating a clear objective function that maximizes the compatible set of synteny blocks under given constraints on overlaps and depths (corresponding to the duplication history in respective genomes). Such a procedure is useful for any pairwise synteny alignments, but is most useful in lineages affected by multiple WGDs, like plants or fish lineages. For example, there should be a 1:2 ploidy relationship between genome A and B if genome B had an independent WGD subsequent to the divergence of the two genomes. We show through simulations and real examples using plant genomes in the rosid superorder that the quota-based screening can eliminate ambiguous synteny blocks and focus on specific genomic evolutionary events, like the divergence of lineages (in cross-species comparisons) and the most recent WGD (in self comparisons). Conclusions: The QUOTA-ALIGN algorithm screens a set of synteny blocks to retain only those compatible with a user specified ploidy relationship between two genomes. These blocks, in turn, may be used for additional downstream analyses such as identifying true orthologous regions in interspecific comparisons. There are two major contributions of QUOTA-ALIGN: 1) reducing the block screening task to a BIP problem, which is novel; 2) providing an efficient software pipeline starting from all-against-all BLAST to the screened synteny blocks with dot plot visualizations. Python code and full documentation are publicly available at http://github.com/tanghaibao/quota-alignment. The QUOTA-ALIGN program is also integrated as a major component in SynMap (http://genomevolution.com/CoGe/SynMap.pl), offering easier access to thousands of genomes for non-programmers.
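
    The quota-based screening described in this abstract can be illustrated with a toy example. The sketch below is not the QUOTA-ALIGN implementation: it replaces the linear programming solver with brute-force enumeration of every 0/1 assignment, which is only practical for a handful of blocks. The block coordinates, the scores, and the names `max_depth`, `screen_blocks`, `quota_a` and `quota_b` are all hypothetical, and the depth limit per genome is a simplified reading of the "constraints on overlaps and depths" mentioned above.

```python
from itertools import product

# Hypothetical synteny blocks: an interval on genome A, an interval on
# genome B, and an alignment score. All values are made up for illustration.
BLOCKS = [
    {"a": (0, 100),   "b": (0, 100),   "score": 50},
    {"a": (50, 150),  "b": (200, 300), "score": 30},
    {"a": (200, 300), "b": (50, 120),  "score": 15},
    {"a": (0, 90),    "b": (400, 500), "score": 10},
]

def max_depth(intervals):
    """Largest number of closed intervals covering any single position."""
    events = []
    for start, end in intervals:
        events.append((start, 0))  # 0 = open; sorts before a close at the same position
        events.append((end, 1))    # 1 = close
    depth = best = 0
    for _, kind in sorted(events):
        depth += 1 if kind == 0 else -1
        best = max(best, depth)
    return best

def screen_blocks(blocks, quota_a, quota_b):
    """Pick the 0/1 assignment with the highest total score such that the
    coverage depth never exceeds quota_a on genome A or quota_b on genome B.
    Brute force over all 2**n assignments (assumption: n is tiny)."""
    best_score, best_choice = 0, tuple(0 for _ in blocks)
    for choice in product((0, 1), repeat=len(blocks)):
        chosen = [blk for blk, keep in zip(blocks, choice) if keep]
        if max_depth([blk["a"] for blk in chosen]) > quota_a:
            continue
        if max_depth([blk["b"] for blk in chosen]) > quota_b:
            continue
        score = sum(blk["score"] for blk in chosen)
        if score > best_score:
            best_score, best_choice = score, choice
    return best_score, best_choice

print(screen_blocks(BLOCKS, quota_a=1, quota_b=1))
print(screen_blocks(BLOCKS, quota_a=2, quota_b=1))
```

    With a depth limit of 1 on both genomes, only the single highest-scoring block survives in this toy data set; allowing depth 2 on genome A additionally admits the second block, which overlaps the first on A but not on B. A real analysis would hand the same objective and constraints to an integer programming solver rather than enumerate subsets.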

    Restriction associated DNA-genotyping at multiple spatial scales in Arabidopsis lyrata reveals signatures of pathogen-mediated selection

    Background: Genome scans based on outlier analyses have revolutionized detection of genes involved in adaptive processes, but reports of some forms of selection, such as balancing selection, are still limited. It is unclear whether high throughput genotyping approaches for identification of single nucleotide polymorphisms have sufficient power to detect modes of selection expected to result in reduced genetic differentiation among populations. In this study, we used Arabidopsis lyrata to investigate whether signatures of balancing selection can be detected based on genomic smoothing of Restriction Associated DNA sequencing (RAD-seq) data. We compared how different sampling approaches (both within and between subspecies) and different background levels of polymorphism (inbreeding or outcrossing populations) affected the ability to detect genomic regions showing key signatures of balancing selection, specifically elevated polymorphism, reduced differentiation and shifts towards intermediate allele frequencies. We then tested whether candidate genes associated with disease resistance (R-gene analogs) were detected more frequently in these regions compared to other regions of the genome. Results: We found that genomic regions showing elevated polymorphism contained a significantly higher density of R-gene analogs predicted to be under pathogen-mediated selection than regions of non-elevated polymorphism, and that many of these also showed evidence for an intermediate site-frequency spectrum based on Tajima’s D. However, we found few genomic regions that showed both elevated polymorphism and reduced FST among populations, despite strong background levels of genetic differentiation among populations. This suggests either insufficient power to detect the reduced population structure predicted for genes under balancing selection using sparsely distributed RAD markers, or that other forms of diversifying selection are more common for the R-gene analogs tested. 
Conclusions: Genome scans based on a small number of individuals sampled from a wide range of populations were sufficient to confirm the relative scarcity of signatures of balancing selection across the genome, but also identified new potential disease resistance candidates within genomic regions showing signatures of balancing selection that would be strong candidates for further sequencing efforts.

    QTL analysis and genomic selection using RADseq derived markers in Sitka spruce: the potential utility of within family data

    Sitka spruce (Picea sitchensis (Bong.) Carr) is the most common commercial plantation species in Britain and a breeding programme based on traditional lines has been in operation since the early 1960s. Rotation lengths of 40 years have led breeders to adopt a process of indirect selection at younger ages based on traits well correlated with final selection, but still the generation interval is unlikely to reduce much below twenty years. Recent successful developments with genomic selection in animal breeding have led tree breeders to consider the application of this technology. In this study a RAD sequence assay was developed as a means of investigating the potential of molecular breeding in a non-model species. DNA was extracted from nearly 500 clonally replicated trees growing in a single full-sibling family at one site in Britain. The technique proved successful in identifying 132 QTLs for 5-year bud-burst and 2 QTLs for 6-year height. In addition, the accuracy of predicting phenotypes by genomic selection was strikingly high at 0.62 and 0.59 respectively. Sensitivity analysis with 200 offspring found only a slight fall in correlation values (0.54 and 0.38), although when the training population was reduced to 50 offspring, predictive values fell further (0.33 and 0.25). This proved an encouraging first investigation into the potential use of genomic selection in the breeding of Sitka spruce. The authors investigate how problems associated with effective population size and linkage disequilibrium can be avoided and suggest a practical way of incorporating genomic selection into a dynamic breeding programme.

    Mapping the sex determination locus in the hāpuku (Polyprion oxygeneios) using ddRAD sequencing

    Background: Hāpuku (Polyprion oxygeneios) is a member of the wreckfish family (Polyprionidae) and is highly regarded as a food fish. Although adults grow relatively slowly, juveniles exhibit low feed conversion ratios and can reach market size in 1–2 years, making P. oxygeneios a strong candidate for aquaculture. However, they can take over 5 years to reach sexual maturity in captivity and are not externally sexually dimorphic, complicating many aspects of broodstock management. Understanding the sex determination system of P. oxygeneios and developing accurate assays to assign genetic sex will contribute significantly towards its full-scale commercialisation. Results: DNA from parents and sexed offspring (n = 57) from a single family of captive bred P. oxygeneios was used as a template for double digestion Restriction-site Associated DNA (ddRAD) sequencing. Two libraries were constructed using SbfI–SphI and SbfI–NcoI restriction enzyme combinations, respectively. Two runs on an Illumina MiSeq platform generated 70,266,464 raw reads, identifying 19,669 RAD loci. A combined sex linkage map (1367 cM) was constructed based on 1575 Single Nucleotide Polymorphism (SNP) markers that resolved into 35 linkage groups. Sex-specific linkage maps were of similar size (1132 and 1168 cM for male and female maps respectively). A single major sex-determining locus, found to be heterogametic in males, was mapped to linkage group 14. Several markers were found to be in strong linkage disequilibrium with the sex-determining locus. Allele-specific PCR assays were developed for two of these markers, SphI6331 and SphI8298, and demonstrated to accurately differentiate sex in progeny within the same pedigree. Comparative genomic analyses indicated that many of the linkage groups within the P. oxygeneios map share a relatively high degree of homology with those published for the European seabass (Dicentrarchus labrax). Conclusion: P. oxygeneios has an XX/XY sex determination system. Evaluation of allele-specific PCR assays, based on the two SNP markers most closely associated with phenotypic sex, indicates that a simple molecular assay for sexing P. oxygeneios should be readily attainable. The high degree of synteny observed with D. labrax should aid further molecular genetic study and exploitation of hāpuku as a food fish.

    Rapid niche expansion by selection on functional genomic variation after ecosystem recovery

    It is well recognized that environmental degradation caused by human activities can result in dramatic losses of species and diversity. However, comparatively little is known about the ability of biodiversity to re-emerge following ecosystem recovery. Here, we show that a European whitefish subspecies, the gangfisch Coregonus lavaretus macrophthalmus, rapidly increased its ecologically functional diversity following the restoration of Lake Constance after anthropogenic eutrophication. In fewer than ten generations, gangfisch evolved a greater range of gill raker numbers (GRNs) to utilize a broader ecological niche. A sparse genetic architecture underlies this variation in GRN. Several co-expressed gene modules and genes showing signals of positive selection were associated with GRN and body shape. These were enriched for biological pathways related to trophic niche expansion in fishes. Our findings demonstrate the potential of functional diversity to expand following habitat restoration, given a fortuitous combination of genetic architecture, genetic diversity and selection.

    Comparative Oncogenomic Analysis of Copy Number Alterations in Human and Zebrafish Tumors Enables Cancer Driver Discovery

    The identification of cancer drivers is a major goal of current cancer research. Finding driver genes within large chromosomal events is especially challenging because such alterations encompass many genes. Previously, we demonstrated that zebrafish malignant peripheral nerve sheath tumors (MPNSTs) are highly aneuploid, much like human tumors. In this study, we examined 147 zebrafish MPNSTs by massively parallel sequencing and identified both large and focal copy number alterations (CNAs). Given the low degree of conserved synteny between fish and mammals, we reasoned that comparative analyses of CNAs from fish versus human MPNSTs would enable elimination of a large proportion of passenger mutations, especially on large CNAs. We established a list of orthologous genes between human and zebrafish, which includes approximately two-thirds of human protein-coding genes. For the subset of these genes found in human MPNST CNAs, only one quarter of their orthologues were co-gained or co-lost in zebrafish, dramatically narrowing the list of candidate cancer drivers for both focal and large CNAs. We conclude that zebrafish-human comparative analysis represents a powerful, and broadly applicable, tool to enrich for evolutionarily conserved cancer drivers. Funding: Kathy and Curt Marble Cancer Research Fund; Arthur C. Merrill; National Institutes of Health (U.S.) (Grant CA106416); National Institutes of Health (U.S.) (Grant ROI RR020833); National Institutes of Health (U.S.) (Grant 1F32GM095213-01)

    Double Digest RADseq: An Inexpensive Method for De Novo SNP Discovery and Genotyping in Model and Non-Model Species

    The ability to efficiently and accurately determine genotypes is a keystone technology in modern genetics, crucial to studies ranging from clinical diagnostics, to genotype-phenotype association, to reconstruction of ancestry and the detection of selection. To date, high capacity, low cost genotyping has been largely achieved via “SNP chip” microarray-based platforms which require substantial prior knowledge of both genome sequence and variability, and once designed are suitable only for those targeted variable nucleotide sites. This method introduces substantial ascertainment bias and inherently precludes detection of rare or population-specific variants, a major source of information for both population history and genotype-phenotype association. Recent developments in reduced-representation genome sequencing experiments on massively parallel sequencers (commonly referred to as RAD-tag or RADseq) have brought direct sequencing to the problem of population genotyping, but increased cost and procedural and analytical complexity have limited their widespread adoption. Here, we describe a complete laboratory protocol, including a custom combinatorial indexing method, and accompanying software tools to facilitate genotyping across large numbers (hundreds or more) of individuals for a range of markers (hundreds to hundreds of thousands). Our method requires no prior genomic knowledge and achieves per-site and per-individual costs below those of current SNP chip technology, while requiring similar hands-on time investment, comparable amounts of input DNA, and downstream analysis times on the order of hours. Finally, we provide empirical results from the application of this method to genotyping both in a laboratory cross and in wild populations. Because of its flexibility, this modified RADseq approach promises to be applicable to a diversity of biological questions in a wide range of organisms.